prov/efa: Refactor AV to split it for efa/efa-direct#12222
Conversation
Add a shared helper that checks whether an efa_ep_addr has a non-zero GID. This validation is needed by both the base AV insert path (efa_av.c) and the protocol AV insert path (efa_proto_av.c) to reject all-zero addresses. Placing it in efa_av.h alongside the efa_ep_addr definition avoids duplicating the check in both files. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add rdm/efa_proto_av.h containing the struct definitions for the protocol AV layer. The header defines efa_proto_av_entry as a flat layout AV entry with ep_addr, ah, fi_addr, implicit_fi_addr, peer map, shm_fi_addr, implicit-AV LRU linkage, AH implicit-conn linkage, and a back-pointer to the owning AV. efa_proto_av embeds struct efa_av as its first member so a base-layer efa_av pointer can be recovered via container_of. efa_proto_av_entry_ep_peer_map_entry describes a single (endpoint, peer) mapping stored on each entry's per-entry peer hash. Header-only commit — no implementation is added here. The structs are referenced by the efa_proto_av.c implementation added in a subsequent commit, and the protocol-AV-aware peer switch several commits later changes efa_rdm_peer's backing object from struct efa_conn * to struct efa_proto_av_entry *. Introducing the types ahead of their first use lets each later commit focus on a single concern. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_av_init_util_av currently hard-codes util_attr.context_len to sizeof(struct efa_av_entry) - EFA_EP_ADDR_LEN. Add a context_len parameter and update the sole caller to pass this value explicitly. This prepares for efa_proto_av_open (added in a later commit) to call efa_av_init_util_av with a different context_len reflecting struct efa_proto_av_entry's size. No behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Introduce two base-layer helpers on struct efa_av_entry so callers no longer need to reach through the embedded struct efa_conn to get at the entry's identity or its ep_addr byte array. efa_av_addr_to_entry wraps the existing efa_av_addr_to_conn with a container_of so it returns the enclosing efa_av_entry directly. efa_av_entry_ep_addr returns a typed struct efa_ep_addr * view over the ep_addr[] byte array stored at the front of the entry. Both are inline. Callers in the base path can use these helpers today; more importantly, when the protocol-AV variant struct efa_proto_av_entry is added later it will have the same leading byte array but no conn pointer, so the byte-array accessor makes it possible to share reverse-AV helpers between the two entry types without a type shim. This commit also promotes efa_av_init_util_av from file-local to a public declaration in efa_av.h so the protocol AV implementation, added in a subsequent commit, can call it. Purely additive — no existing callers are migrated in this commit. Subsequent commits update base-path callers to use the new helpers. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Update efa_msg.c, efa_rma.c, efa_base_ep.c, and efa_domain.c to use the new efa_av_addr_to_entry() / efa_av_entry_ep_addr() helpers instead of efa_av_addr_to_conn() / conn->ep_addr. Reads are still routed through the embedded conn for ah (av_entry->conn.ah) because struct efa_conn is still embedded in efa_av_entry. A later commit strips the conn embedding and collapses these accesses to av_entry->ah directly. Mechanical rename; no behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
…av_entry * The reverse-AV helpers previously took struct efa_conn *. Change them to take struct efa_av_entry * so they can be shared across the base (efa-direct) and protocol (RDM, added in a later commit) layers without a type shim. Update internal field reads to go through the byte-array accessor efa_av_entry_ep_addr() rather than conn->ep_addr pointer deref. The ep_addr pointer and the byte array reference the same storage, so behavior is unchanged — but efa_av_entry_ep_addr() also works for struct efa_proto_av_entry (the protocol's flat layout variant added later), which lacks an ep_addr pointer field. Store struct efa_av_entry * in the efa_cur_reverse_av / efa_prv_reverse_av structs instead of struct efa_conn *. Update callers in efa_av.c and efa_conn.c using container_of() to pass the enclosing efa_av_entry when they only have a conn pointer. No behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add the full efa_proto_av.c implementation as a single unit. Protocol AV concentrates all RDM-specific AV concerns in one file: implicit AV insert and release, implicit-to-explicit migration, per-endpoint peer map, implicit-AV LRU tracking, SHM AV insertion, connid-aware reverse lookup, and peer allocation tied to entry lifetime. Moving these responsibilities out of efa_av.c / efa_conn.c is what lets the base AV shrink to an efa-direct-only layer later in this series. This commit drops the file in but deliberately leaves it unwired. fi_ops_domain_rdm.av_open still points at efa_av_open, so nothing in the RDM domain path reaches any code in efa_proto_av.c yet. Functions are given external linkage so the library still links. A later commit registers efa_proto_av_open as the RDM domain's av_open, switches struct efa_rdm_peer's backing object from struct efa_conn to struct efa_proto_av_entry, and updates all RDM callers in one atomic step. One interim-state detail: the newly added pke peer-fi_addr callback reads peer->conn->fi_addr because struct efa_rdm_peer still carries a struct efa_conn * at this point. When the peer switch lands, that single expression becomes peer->av_entry->fi_addr along with the rest of the peer-type change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_rdm_ep_destroy_buffer_pools tests whether a peer's explicit fi_addr is populated; if not, falls through to the implicit AV. The test was written with FI_ADDR_UNSPEC, which is semantically "unspecified peer address" (used by e.g. fi_recvfrom for any source) rather than "this slot has no valid fi_addr". FI_ADDR_UNSPEC and FI_ADDR_NOTAVAIL are both defined as (uint64_t)-1 in fabric.h, so the runtime behavior is identical. This is a naming correction, not a behavior change. Every other site in the provider that tests the same peer->conn->fi_addr / ->implicit_fi_addr condition uses FI_ADDR_NOTAVAIL (e.g. efa_rdm_cq.c, efa_proto_av.c, efa_av.c). Bring this site in line for consistency. No behavior change. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
…o_av_entry This is the atomic switch that moves the RDM protocol layer off of struct efa_conn and onto struct efa_proto_av_entry. Because the peer's backing object changes type and every caller that reaches through a peer pointer must be updated in lockstep, splitting this work across multiple commits would leave the tree uncompilable between them. At the domain level, fi_ops_domain_rdm.av_open is set to efa_proto_av_open so RDM domains instantiate an efa_proto_av instead of the base efa_av. A proto_av field is added to struct efa_rdm_ep and populated in efa_rdm_ep_bind via container_of(av, struct efa_proto_av, efa_av); caching it on the endpoint avoids repeating the container_of on the hot path. struct efa_rdm_peer's backing object changes from struct efa_conn * to struct efa_proto_av_entry *. The field is renamed from conn to av_entry to reflect the new type, and efa_rdm_peer_construct's signature is updated to take the new pointer. All RDM sources that dereferenced peer->conn are updated to peer->av_entry: efa_rdm_cq.c, efa_rdm_msg.c, efa_rdm_ope.c, efa_rdm_pke*.c, efa_rdm_util.c, efa_domain.c, and efa_rdm_ep_utils.c. The efa_rdm_ep_get_peer_* helpers in efa_rdm_ep_utils.c are migrated from efa_av_addr_to_conn / conn-based peer-map lookup to efa_proto_av_addr_to_entry and efa_proto_av_entry_ep_peer_map_lookup. efa_rdm_ep_get_explicit_shm_fi_addr moves out of efa_rdm_ep.h into efa_rdm_ep_utils.c. Keeping the accessor inline in the header would force efa_rdm_ep.h to include efa_proto_av.h, and efa_proto_av.h depends transitively on efa_rdm_peer.h; an out-of-line definition keeps the include graph acyclic. The lone pke callback introduced with efa_proto_av.c that still read peer->conn->fi_addr is flipped to peer->av_entry->fi_addr as part of this commit — it is the single expression that had to wait for the peer-type switch to land. efa_rdm_ep_destroy_buffer_pools similarly migrates to the efa_proto_av_entry peer-map entry type. efa_ah_implicit_av_evict_ah in efa_ah.c is updated to iterate efa_proto_av_entry on the AH LRU list and call efa_proto_av_entry_release_ah_unsafe. This function is relocated to efa_proto_av.c entirely in the subsequent AH-layering commit; the update here keeps the intermediate state functional. Test mocks and the efa AV test file are updated for the new peer field name, the new efa_rdm_peer_construct signature, and the new peer-map entry type. test_av_verify_av_hash_cnt gains a struct efa_proto_av * parameter so it can reach both the base efa_av hash counters and the protocol-only implicit-AV fields through a single call. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_ah_alloc acquires util_domain.lock before looking up the AH in the domain's hash map. If the subsequent malloc for a new efa_ah fails, the function sets errno to FI_ENOMEM, logs a warning, and returns NULL without releasing the lock. The caller has no way to know the lock is still held, so every subsequent operation that touches util_domain.lock (including the caller's own cleanup path and any other thread trying to allocate an AH) deadlocks or UBs. Release util_domain.lock on the malloc-failure path before returning NULL, matching the two other failure paths in the function that do the unlock via err_free_efa_ah. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
efa_ah.c reached into protocol-layer types to run AH eviction on ENOMEM: it iterated an AH's implicit_conn_list as struct efa_proto_av_entry and called efa_proto_av_entry_release_ah_unsafe to drop the associated AV entries. Those types are defined in rdm/efa_proto_av.h, which efa_ah.c picks up transitively through efa.h; the base AH translation unit should not be aware of them. The circularity shows up in practice whenever the protocol layer grows a new concern — every change to efa_proto_av_entry forces a recompile of efa_ah.c and risks dragging protocol-only behavior into the efa-direct build. The fix splits AH allocation into a base path and a protocol path. efa_ah_alloc stays in efa_ah.c but returns NULL on ENOMEM without attempting eviction; its only responsibilities are now creating the ibv_ah, populating gid/ahn/refcnt, and inserting into the domain's AH hash. The pre-existing EINVAL diagnostic (pretty-printing local and remote GIDs plus the PD pointer for the common cross-AZ / invalid-GID / invalid-PD causes) is preserved in the base path. A new efa_proto_ah_alloc is added in efa_proto_av.c. It calls efa_ah_alloc and, on ENOMEM, invokes efa_proto_ah_evict to release the AH at the head of the LRU whose explicit_refcnt is zero, then retries. The eviction logic itself is lifted wholesale from efa_ah.c into efa_proto_av.c, where it belongs alongside the other implicit-AV bookkeeping. efa_ah_destroy_ah is promoted from a file-local function prototype to a public declaration in efa_ah.h so efa_proto_ah_evict can call it after dropping the last reference. efa_ah.c no longer references efa_proto_av_entry. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
The efa_ah struct contained protocol-specific fields (implicit_refcnt, explicit_refcnt, implicit_conn_list, domain_lru_ah_list_entry) that are only used by the RDM protocol path for implicit AV management and AH LRU eviction. The efa-direct path only needs the ibv_ah, GID, AHN, and a reference count. This commit separates them following the same pattern used for efa_av / efa_proto_av. The base efa_ah struct is stripped to: gid, ibv_ah, ahn, refcnt, and the hash handle. The split implicit/explicit refcounts are replaced with a single refcnt. efa_ah_alloc now takes an alloc_size parameter instead of bool insert_implicit_av, allowing the protocol layer to allocate a larger efa_proto_ah. efa_ah_release no longer takes a bool parameter — it simply decrements refcnt and destroys at zero. The base layer has no knowledge of LRU lists, implicit AV entries, or eviction. A new efa_proto_ah struct is added to efa_proto_av.h, embedding efa_ah as its first member (castable via container_of). It adds the protocol-specific fields: implicit_refcnt, explicit_refcnt, implicit_conn_list (list of implicit AV entries using this AH), and lru_list_entry (position in the domain's AH LRU list). efa_proto_ah_alloc in efa_proto_av.c allocates sizeof(efa_proto_ah) via efa_ah_alloc, initializes the protocol fields for new AHs, manages the implicit/explicit refcount split, maintains the LRU list, and handles ENOMEM with eviction retry. efa_proto_ah_release decrements the appropriate protocol refcount and removes from the LRU list when both reach zero before calling efa_ah_release to decrement the base refcount. The efa_proto_ah_lru_move function (formerly efa_ah_implicit_av_lru_ah_move in efa_ah.c) is now static in efa_proto_av.c, operating on the efa_proto_ah lru_list_entry. The efa_proto_ah_from_ah inline accessor is added to efa_proto_av.h for converting base efa_ah pointers to their protocol wrapper. Update efa_conn.c and efa_av.c to access the moved fields (implicit_refcnt, explicit_refcnt, implicit_conn_list) through efa_proto_ah_from_ah(). These files contain the old implicit-AV code paths that remain alive until the next commit strips efa_av and deletes efa_conn. Update efa_ah_alloc/release callers in these files for the new signatures. Split the efa_ah_cnt_av_impl test helper into two path-specific variants: efa_ah_cnt_av_efa_impl uses efa_proto_ah_from_ah() to check the protocol refcnts (implicit/explicit), while efa_ah_cnt_av_efa_direct_impl checks the base efa_ah refcnt directly. The combined helper would have dereferenced efa_proto_ah fields past the end of a base-sized efa_ah allocation on the efa-direct path; splitting the helpers removes the runtime branching and the out-of-bounds access. Test mocks are updated for the new efa_ah_alloc (size_t alloc_size) and efa_ah_release (no bool) signatures. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
With every protocol-specific AV concern now living in efa_proto_av.c, remove the remnants from the base AV layer. efa_av.h and efa_av.c are reduced to what the efa-direct path actually needs: the efa_av struct now contains only domain, used, type, cur_reverse_av, prv_reverse_av, and util_av. All implicit-AV fields, LRU tracking, peer-map logic, SHM AV handling, and implicit reverse-lookup functions are gone from this file. struct efa_av_entry becomes a flat 48-byte record of ep_addr[32], a struct efa_ah *ah, and fi_addr_t fi_addr. struct efa_conn is no longer embedded, and the base path's callers switch from av_entry->conn.ah to av_entry->ah directly. The _Static_assert that previously tied efa_av_entry.conn.ah and efa_proto_av_entry.ah to a common offset is updated to reference efa_av_entry.ah directly. Two new static helpers efa_av_entry_init and efa_av_entry_release replace the old efa_conn_alloc / efa_conn_release paths for the base-only lifecycle. efa_av_insert_one and efa_av_insert lose their external linkage because they are only called from within efa_av.c now (RDM uses efa_proto_av_insert_one). Drop efa_conn.c and efa_conn.h from prov/efa/Makefile.include and libfabric.vcxproj. The files themselves are left on disk and deleted in a follow-up commit so this commit is strictly a functionality change and the next one is a pure file deletion. Leaving the files out of the build list means their compile errors against the new efa_av / efa_av_entry shapes are harmless — nothing includes or compiles them from this commit forward. Update efa_ah.c to drop the now-unused #include "efa_conn.h". Update the efa_unit_test_proto_av.c test file to call efa_proto_av_reverse_lookup in place of the deleted efa_av_reverse_lookup_rdm. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
The previous commit removed the last references to efa_conn from the tree (both as source and as a build target). The files are no longer compiled or included by anything. Delete them. Pure file deletion — no other changes. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Split efa_unit_test_av.c into two files: - efa_unit_test_av.c: base AV tests (efa-direct, AH counting, EP binding) - efa_unit_test_proto_av.c: protocol AV tests (implicit AV, LRU, peer map, reverse lookup, implicit-to-explicit migration, AH eviction) The new efa_unit_test_proto_av.c also contains the body of test_av_reverse_av_remove_qpn_collision, whose declaration and registration were added earlier but whose implementation had not been defined. Co-locating it with the other protocol AV tests keeps declaration, registration, and definition in sync. Update prov/efa/Makefile.include to build the new file. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
Add thirteen new unit tests that exercise the refactored AV code paths. The existing suite focused mostly on AH refcount and endpoint-binding behavior; these tests add direct coverage of entry insert/remove/lookup, protocol-AV-only behaviors, and a handful of error paths that previously had no assertions. Base AV coverage adds test_av_insert_remove_lookup_efa_direct, which walks a full insert/lookup/remove cycle against the base efa_av, and test_av_base_addr_to_entry_invalid, which confirms that efa_av_addr_to_entry returns NULL for FI_ADDR_NOTAVAIL and FI_ADDR_UNSPEC inputs. Protocol AV coverage exercises lookup and lifetime of efa_proto_av_entry: test_av_proto_reverse_lookup_explicit covers reverse lookup after insert and remove, test_av_proto_prv_reverse_av covers QPN reuse falling through to prv_reverse_av, test_av_proto_addr_to_entry_after_remove checks the validity bit (ah != NULL) after a remove, test_av_proto_insert_remove_with_peer drives the per-endpoint peer-map lifecycle alongside insert and remove, test_av_proto_batch_insert covers a multi-address fi_av_insert, and test_av_proto_remove_nonexistent covers the error path for a fi_addr that was never inserted. Implicit-AV behavior adds test_av_implicit_to_explicit_peer_updated, which verifies that migrating a peer from the implicit AV to the explicit AV preserves peer and AH identity; test_av_implicit_av_unbounded, which confirms the implicit AV grows without an explicit cap; and test_av_implicit_av_lru_move_single, which exercises LRU move when only a single implicit entry exists. Input validation adds test_av_proto_insert_invalid_address for inserts of all-zero endpoint addresses and test_av_proto_open_unsupported_attrs for fi_av_open with attributes the protocol AV does not accept. Signed-off-by: Seth Zegelstein <szegel@amazon.com>
| * @param[in] addr address to check | ||
| * @return non-zero if valid, 0 if all-zeros | ||
| */ | ||
| static inline int efa_av_is_valid_address(struct efa_ep_addr *addr) |
There was a problem hiding this comment.
Commit message says "add" but it's really being moved
| * ep_addr[32] off=0 — TX hot (qpn@+16, qkey@+20) | ||
| * ah* off=32 — TX hot (EFA send path) | ||
| * fi_addr off=40 — RX hot (explicit peer lookup, CQ poll) | ||
| * implicit_fi_addr off=48 — RX hot (implicit peer lookup, CQ poll) |
There was a problem hiding this comment.
Implicit AV entries are not considered hot path. In the hot path, the application inserts the peer into the AV
I think switching implicit_fi_addr and shm_fi_addr would be better
| * efa_av.prv_reverse_av (off=32) — HASH_FIND fallback for QPN reuse (connid mismatch) | ||
| * These are in cacheline 0 — explicit peer reverse lookup stays in one line. | ||
| * | ||
| * RX hot path for implicit (unknown) peers: |
There was a problem hiding this comment.
Same as above. Let's not call the implicit peer path "hot"
| /** | ||
| * @brief Protocol AV entry — flat layout with same field prefix as efa_av_entry | ||
| * | ||
| * pahole: |
There was a problem hiding this comment.
The layout is nice to have in the comments but we must make sure the comment is up to date
| return NULL; | ||
|
|
||
| entry = (struct efa_proto_av_entry *)util_av_entry->data; | ||
| return entry->ah ? entry : NULL; |
There was a problem hiding this comment.
Shouldn't this be assert(entry->ah)? Based on the util_av removal code below, you set entry->ah to NULL after calling ofi_av_remove_addr which calls ofi_ibuf_free. So we shouldn't ever have the case where ofi_bufpool_ibuf_is_valid returns true but ah is NULL
| #include "efa_data_path_ops.h" | ||
| #include "ofi_util.h" | ||
| #include "efa_av.h" | ||
| #include "rdm/efa_proto_av.h" |
There was a problem hiding this comment.
The rdm/ prefix shouldn't be necessary because this file is also in rdm/ folder. Also happens in multiple other files.
| * ah_implicit_conn_list_entry off=88 — implicit AV insert/release | ||
| * av* off=104 — back-pointer for AH eviction | ||
| */ | ||
| struct efa_proto_av_entry { |
There was a problem hiding this comment.
rdm stands for reliable datagram, and all of efa is reliable datagram.... Its a terrible name, and I think that a pure rename change %s/rdm/proto won't go so well... so I want to do it in bits and pieces.
There was a problem hiding this comment.
I think it will go well if we do s/efa_rdm/efa_proto finally, AFAIU we don't have such combo beyond the code
There was a problem hiding this comment.
How does it hurt us to do it in bits and pieces? I feel like its not unclear as-is with the mixed efa_rdm and efa_proto so I don't understand why this is a problem?
This PR fully transitions the AV component to the new naming scheme (rest of code base is still on old naming scheme, but we are consistent within the component).
There was a problem hiding this comment.
the reason I think s/efa_rdm/efa_proto wont go well is three fold:
- We will essentially need to stop all changes to the EFA provider to let that change through
- Everything in progress will get a bad merge conflict
- We will make git blame slightly harder to use b/c we will effectively change the entire codebase efa-proto with 1 commit, you would have to ignore it to get a useful git blame (file rename + struct rename).
There was a problem hiding this comment.
How does it hurt us to do it in bits and pieces?
Because we will have a mix of efa_rdm and efa_proto prefixes and that makes code search very difficult
There was a problem hiding this comment.
+1 with Sai, and I am ok with the git blame confusion as we should expect no behavior change for both refactor and renaming ...
There was a problem hiding this comment.
meh ok, I will revert the rename when I have some time :(
There was a problem hiding this comment.
as we should expect no behavior change for both refactor and renaming ...
The git-blame issue is that the rename (both code/file) hides the last commit that caused the functional problem, and sets the blame to look at the rename even though it is not funcitonal problem.
Refactor EFA AV layer into base and protocol components
What
Splits the EFA provider's AV layer into base (efa_av) and protocol (efa_proto_av) components, mirroring the existing pattern used elsewhere in the provider. Does the same split for efa_ah. Deletes efa_conn.{c,h}.
Why
Three interrelated design decisions drive this refactor:
Split AV into base + protocol. struct efa_av carried implicit-AV fields (LRU list, implicit util_av, evicted_peers hashset) that efa-direct never uses. struct efa_av_entry embedded struct efa_conn, which bundled RDM-specific concerns (AH lifetime, peer map, SHM insertion, implicit-to-explicit migration). After this refactor, the efa-direct data path uses minimal base structs; all RDM-specific state lives in efa_proto_* wrappers that container_of-embed the base types.
Delete efa_conn entirely. Once struct efa_proto_av_entry exists as the RDM-side AV entry, struct efa_conn is redundant — it was always just"the RDM-specific tail" of struct efa_av_entry. Keeping both would leave the codebase with two types representing the same thing. The cleaner resolution is to absorb efa_conn's responsibilities into efa_proto_av_entry (for protocol behaviors) and efa_av_entry (for base behaviors), then remove efa_conn.c/h outright. struct efa_rdm_peer's backing pointer is retyped from efa_conn * to efa_proto_av_entry * in the same change.
Split AH along the same base/protocol lines. struct efa_ah carried protocol-only state: split implicit/explicit refcounts, an
implicit_conn_list, and an LRU-list entry for AH eviction. None of this matters on the efa-direct path. Worse, efa_ah.c iterated struct efa_proto_av_entry for its eviction logic — a layering violation where the base AH translation unit had a hard dependency on protocol-layer types. The fix is to shrink struct efa_ah to a single unified refcount and wrap it in a new struct efa_proto_ah that holds all the protocol concerns, then move the eviction logic out of efa_ah.c into efa_proto_av.c.
Testing
Every commit individually validated with --enable-debug --enable-asan --enable-efa-mem-poisoning plus the cmocka unit-test suite. 13 new unit tests added covering insert/lookup/remove cycles, reverse-AV behaviors, implicit-to-explicit migration, QPN reuse, and error paths.